Magic number 7 ± 2 in networks of threshold dynamics 
Information processing by random feed-forward networks consisting of units with sigmoidal input-output
response is studied by focusing on the dependence of its outputs on the number of parallel
paths M. It is found that the system leads to a combination of on/off outputs when M < 7, while for 
M > 7, chaotic dynamics arises, resulting in a continuous distribution of outputs. This universality 
of the critical number M ~ 7 is explained by combinatorial explosion, i.e., dominance of factorial 
over exponential increase. Relevance of the result to the psychological magic number 7 ± 2 is briefly 
discussed. 

Information processing (IP) in biological systems is
often carried out by elements that show threshold-type
(sigmoid) input-output (IO) behaviors. For example, the
expression of a gene is determined by an on/off switch for
its transcription factors. Another example is the signal
transduction system in a cell, where an enzymatic reaction
often displays a sigmoid output when responding
to an external stimulus such as the abundance of an input
chemical. One of the most well-studied examples of
sigmoid IO relationships can be found in neural networks,
where the output of each neuron depends on inputs from
other neurons or a receptive field [3].

In these biological networks, the connections among
elements are often entangled. Cross-talk in signal transduction
has recently been observed for several systems,
while enzymatic reactions are generally entangled as well.
The connections in neural networks are known to be com- 
plex. Besides complexity, biological networks can also 
display cascade type structures leading from the exter- 
nal input to the final output. In signal transduction such 
cascades are encountered, while layered networks have 
been discussed as an idealization of biological neural net- 
works. Hence, the study of entangled layered networks is 
generally important. The information processing in such 
systems is discussed by judging whether distinct attract- 
ing points (or sets) are reached through the dynamics in 
successive layers, depending on the input. In a layered 
network system, e.g., the attracting set is the state of the 
output layer. 

In an entangled network, as the number of degrees of
freedom to be processed increases, more mutual interference
can occur, thus increasing the complexity.
Consequently, the IP ability of the network depends
on the number of processed degrees of freedom. In this
paper, we discuss this number dependence and explore
some universal properties of entangled networks with sigmoid
units.

In connection with this problem, it is interesting to
note that the term magic number 7 ± 2 was coined in
psychology, where the number of chunks (items) that
can be memorized in short-term memory is found to be
limited to about 7 ± 2. In neural networks, this corresponds
to the number of inputs beyond which the output
that depends on these inputs no longer clearly separates.

In order to investigate the question raised above, we
adopt a cascade perceptron as an abstract model of a
random sigmoid response. Here we consider feed-forward
network dynamics without feedback loops, for
simplicity. Each layer l is composed of M elements and
all elements are regulated by the elements in the preceding
layer:

$$x_i^l = \tanh\left( \frac{\beta}{\sqrt{M}} \sum_{j=1}^{M} \epsilon_{ij}^{l} x_j^{l-1} - \theta_i^l \right) \qquad (1)$$

where $x_i^l$ represents the state of the i-th element of the
l-th layer, and $\theta_i^l$ is the threshold value for $x_i^l$ to be 'excitatory'.
Unless otherwise stated, we set $\theta_i^l = 0$, as this specific
choice is not important for the later discussion. The
coupling terms $\epsilon_{ij}^l$ are chosen randomly from a Gaussian
distribution with standard deviation 1.0. The parameter
$\beta$, normalized by $\sqrt{M}$, determines the steepness of the
sigmoid function. As $\beta$ approaches 0, Eq. (1) approaches
a constant function with $x_i^l = 0$ (or $x_i^l = -\theta_i^l$). On the
other hand, as $\beta$ increases, Eq. (1) approaches a
step function such that in almost all cases $x_i^l$ becomes
either $-1$ or 1, and each element effectively has just 2
states. For the medium range of $\beta$, the IO relationship is
smooth, which, as will be shown later, may lead to complex
dynamics. Note that if all the thresholds $\theta_i^l = 0$,
the change of sign $x \to -x$ preserves the equations of the
system, so that the solutions for Eq. (1) are symmetric.
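
As a minimal sketch of a single layer update, Eq. (1) can be written in plain Python. The sketch assumes that the threshold sits outside the gain factor and that the Gaussian couplings have zero mean (the text only states the standard deviation). The names `random_couplings` and `layer_step` are hypothetical and not taken from the paper.

```python
import math
import random

def random_couplings(m, rng):
    """Draw an m x m coupling matrix with Gaussian entries.

    Standard deviation 1.0 follows the text; zero mean is an assumption.
    """
    return [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(m)]

def layer_step(x_prev, eps, beta, theta=0.0):
    """Assumed form: x_i = tanh(beta/sqrt(M) * sum_j eps[i][j] * x_prev[j] - theta)."""
    m = len(x_prev)
    gain = beta / math.sqrt(m)  # beta normalized by sqrt(M)
    return [math.tanh(gain * sum(e * xj for e, xj in zip(row, x_prev)) - theta)
            for row in eps]

rng = random.Random(0)
eps = random_couplings(6, rng)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(6)]  # inputs in [-1, 1]
x1 = layer_step(x0, eps, beta=3.0)
print(x1)
```

With `theta=0.0`, flipping the sign of every input flips the sign of every output, which is the symmetry noted in the text.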

For the information processing stages carried out
in each layer, $\mathbf{x}^l = F_l(\mathbf{x}^{l-1})$ holds, where $\mathbf{x}^l =
(x_1^l, x_2^l, \ldots, x_M^l)$ and $F_l$ is the processing dynamics carried
out in the l-th layer. We set the values of the 0-th
layer as the inputs to be processed. If a succeeding layer
is regarded as a next time step, the present system can
be interpreted as a random dynamical system [6], where
the processing corresponds to temporal evolution. We
take various inputs, each a set of values $x_i^0$
randomly chosen such that $x_i^0 \in [-1, 1]$ for each $i$, and
compute numerically the evolution of Eq. (1).

Let us first discuss the qualitative behavior of Eq. (1).
For $\theta_i^l = 0$, the outputs $x_i^l$ converge to 0, irrespective
of the inputs, if $\beta < 1$, while they approach either 1 or
$-1$ when $\beta$ is large. For middle-range values of $\beta$, outputs
may take values between $-1$ and 1, depending on
the input. In this regime, outputs are often sensitive to
changes of the inputs, and indeed orbital instability exists
in the evolution through the layers. The degree of
this instability depends on the number of parallel paths
and on interference. For sufficiently small M (< 7), the
IP is stable, in the sense that the $x_i^l$ in the output layer
assume only a few distinct values, depending on the input
values. On the other hand, if M is large (> 7), such convergence
is not common. We have computed a histogram
of the output values $P(x_i^l)$ sampled over 5.0 x 10^ randomly
chosen inputs. As shown in Fig. 1, there are clearly
two peaks at $x = \pm 1$ for M = 6 and $P(x) \approx 0$ for $x \neq \pm 1$,
while for M = 8, the distribution is broad.

The scattering in the values $x_i^l$ of the attractor for
the latter case is due to chaotic dynamics in Eq. (1),
where stretching and folding in phase space appear [7].
Fig. 2(a,b) shows values of $\mathbf{x}^l$ projected onto the $(x_1^l, x_2^l)$
plane. In the plot we take $l = 30$ and 10^ inputs given
at $l = 0$. For M = 6 (a), each $\mathbf{x}^l$ at $l = 30$ is localized
within a small volume of the total phase space. On
the other hand, for M = 8 (b), one can see folding and
stretching, and the scattering of points throughout phase
space. With these chaotic dynamics, tiny differences in
the input values are amplified, making clear separation of
inputs impossible.

The above simulations are carried out with $\beta = 3.0$,
but this stretching and folding process is observed as long
as $\beta > 1$. To obtain insight into the dependence on $\beta$,
values of $x_1^{l=30}$ for 800 inputs are plotted as a function of
$\beta$ in Fig. 3. For (a) M = 6, they converge to a few points for
a large portion of the $\beta$ range, while for (b) M = 8 they do not.

These numerical results suggest that a critical number
of parallel paths $M_c$ exists, beyond which chaotic dynamics
is inevitable, and that $M_c$ is around 7 for a wide range
of $\beta$. We confirm this critical number by computing several
characteristic quantities for the model Eq. (1).

First, we plot the fraction of bins for which $P(x_i^l)$ is not
zero in each layer in Fig. 4(a). Here $P(x_i^l)$ is computed
over 5.0 x 10^ inputs, by taking a bin size of 2.0/128. For
M < 7 the fraction becomes smaller from layer to layer,
while for M > 8 the fraction is almost one and does
not decrease much for successive layers. For M < 7, the
output points are well separated by the sigmoid function,
while they are scattered over the whole range of values
$[-1, 1]$ for M > 8. The data for Fig. 4(a) are obtained
for a fixed threshold $\theta_i = 0$, but the conclusion does
not change even when the thresholds are distributed, as
shown in Fig. 4(b), where $\theta_i \in [0, 0.5]$. These behaviors
are also invariant against changes in $\beta$, as long as it is
sufficiently larger than 1 but not so large that the tanh
function effectively becomes a step function. Hence
the critical number $M_c \approx 7$ is rather general, without
dependence on the details of the model.
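
The occupied-bin measurement described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: all thresholds are zero, the couplings are zero-mean Gaussian with unit standard deviation, one fixed network is used per run, and far fewer inputs are sampled than in the paper. The function names (`make_network`, `propagate`, `output_histogram`, `occupied_fraction`) are hypothetical.

```python
import math
import random

def make_network(m, n_layers, rng):
    # One Gaussian coupling matrix per layer (zero mean is an assumption).
    return [[[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(m)]
            for _ in range(n_layers)]

def propagate(x0, network, beta):
    # Apply the assumed layer map of Eq. (1) with all thresholds set to 0.
    x = list(x0)
    gain = beta / math.sqrt(len(x))
    for eps in network:
        x = [math.tanh(gain * sum(e * xj for e, xj in zip(row, x))) for row in eps]
    return x

def output_histogram(m, beta, n_inputs=500, n_layers=30, n_bins=128, seed=0):
    # Histogram of the first output element over random inputs in [-1, 1].
    rng = random.Random(seed)
    network = make_network(m, n_layers, rng)
    counts = [0] * n_bins
    for _ in range(n_inputs):
        x0 = [rng.uniform(-1.0, 1.0) for _ in range(m)]
        x_first = propagate(x0, network, beta)[0]
        # Bin width is 2.0 / n_bins; the value 1.0 is clamped into the last bin.
        idx = min(int((x_first + 1.0) / 2.0 * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

def occupied_fraction(counts):
    # Fraction of bins that contain at least one output value.
    return sum(1 for c in counts if c > 0) / len(counts)

for m in (6, 8):
    print(m, occupied_fraction(output_histogram(m, beta=3.0)))
```

Whether a single random network shows the collapse for M = 6 or the broad distribution for M = 8 depends on the seed; the figures in the paper are based on many inputs and, for Fig. 4, on an average over many networks.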

Second, we have computed the degree of orbital instability
in the chaotic dynamics, i.e., the sensitivity to
input values. By regarding a layer as a time step in a dynamical
system, the sensitivity is computed by the Lyapunov
exponent of the random map Eq. (1) as follows [9]:

$$\lambda_{\max} = \max_{\mathbf{x},\, |\mathbf{x}| = 1} \frac{1}{l} \ln \left| J^{l} J^{l-1} \cdots J^{1} \mathbf{x} \right| \qquad (2)$$

where $J^l$ is the Jacobian matrix of Eq. (1), $J^l_{ik} = \partial x_i^l / \partial x_k^{l-1}$, so
that $\delta x_i^l = \sum_k J^l_{ik}(\mathbf{x}^{l-1})\, \delta x_k^{l-1}$. The fraction of networks
having a positive exponent $\lambda_{\max}$ is plotted in Fig. 5
for the following three cases: $\theta_i^l = 0$ with a Gaussian distribution
for $\epsilon_{ij}^l$; $\theta_i^l = 0$ with a uniform distribution for
$\epsilon_{ij}^l \in [-1, 1]$; and distributed thresholds $\theta_i^l \in [0, 0.5]$ with
a Gaussian distribution for $\epsilon_{ij}^l$. For all three cases,
the fraction of networks with chaotic behavior drastically
increases around M = 7.
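
A hedged sketch of how such an exponent might be estimated numerically is given below. It assumes zero thresholds, zero-mean Gaussian couplings drawn afresh in every layer, and a single random tangent vector that is renormalized at each layer (a generic direction is expected to pick up the largest exponent). This is a standard tangent-vector estimate, not necessarily the authors' exact procedure, and the function name `lyapunov_exponent` is hypothetical.

```python
import math
import random

def lyapunov_exponent(m, beta, n_layers=500, seed=0):
    """Estimate the largest Lyapunov exponent of the layer-to-layer map.

    Assumed map: x_i^l = tanh(beta/sqrt(M) * sum_k eps_ik^l * x_k^(l-1)).
    """
    rng = random.Random(seed)
    gain = beta / math.sqrt(m)
    x = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    # Random unit tangent vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    log_growth = 0.0
    for _ in range(n_layers):
        eps = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(m)]
        x_new = [math.tanh(gain * sum(e * xk for e, xk in zip(row, x)))
                 for row in eps]
        # Jacobian entries: J_ik = (1 - x_i^2) * gain * eps_ik.
        v_new = [(1.0 - xi * xi) * gain * sum(e * vk for e, vk in zip(row, v))
                 for xi, row in zip(x_new, eps)]
        norm = math.sqrt(sum(c * c for c in v_new))
        if norm == 0.0:
            # Fully saturated units: perturbations vanish entirely.
            return float("-inf")
        log_growth += math.log(norm)
        v = [c / norm for c in v_new]  # renormalize to avoid under/overflow
        x = x_new
    return log_growth / n_layers

for m in (4, 6, 8, 10):
    print(m, lyapunov_exponent(m, beta=3.0))
```

To approximate the quantity plotted in Fig. 5, one would repeat this over many seeds (networks) and several values of $\beta$, and count the fraction of networks for which the estimate is positive.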

Loss of separability of inputs around the number 7 due
to chaotic dynamics is not limited to the model investigated
above. We have also investigated some other
models consisting of units with threshold dynamics (of
Michaelis-Menten form for enzymatic reactions) that
are randomly connected in a cascade. The same behavior
with the same critical number 7 is obtained. On
the other hand, it is also interesting to note that Milnor
attractors that collide with their basin boundary are
dominant for globally coupled dynamical systems with
more than 7 ± 2 degrees of freedom [11, 12].

Then, why is the critical number 7 (or 7 ± 2) so universal?
In earlier work, one of the authors (KK) discussed the
possibility that the combinatorial explosion of the basin
boundaries due to chaotic dynamics is relevant to this
critical number (i.e., the faster increase of $N!$ over $2^N$).
This combinatorial argument can be extended to the
present problem.

We do so by considering the origin of the folding process.
In order to see the effect of entanglement, we
study the input-output relationship $x_1^0 \to x_1^2$ of a
two-layer system [14] by fixing the inputs $x_2^0, \ldots, x_M^0$.
Then the output is given as a function of $x_1^0$:
$x_1^2 = \tanh\left( \sum_j \sigma_j(x_1^0 - v_j) \right)$, where
$\sigma_j(u) = \tilde\beta \epsilon_{1j}^2 \tanh(\tilde\beta \epsilon_{j1}^1 u)$
and $v_j = -(\epsilon_{j1}^1)^{-1} \sum_{k=2}^{M} \epsilon_{jk}^1 x_k^0$. Here it is assumed that
$\tilde\beta = \beta/\sqrt{M}$ is large, and that $\tanh(x)$ is close to a
step function. Note that there are M paths via the
middle-layer elements, where $x_j^1$ switches between the
values $-1$ and 1 as $x_1^0$ crosses the 'threshold' $v_j$. One
can then renumber the index $j = 1, \ldots, M$ such that
$-1 < v_1 < v_2 < \cdots < v_M < 1$. With this ordering, if
$\sigma_j$ is positive and $\sigma_{j+1}$ is negative, the one-dimensional
mapping $x_1^0 \to x_1^2$ has a single hump at $v_j < x < v_{j+1}$,
implying a folding process as in the logistic map. Then, if
the sign of $\sigma_j$ alternates for successive $j$, the above function
switches between $-1$ and 1 at the $v_j$, M times as $x_1^0$ is
increased. The one-dimensional mapping from the input
$x_1^0$ to the output $x_1^2$ is thus subject to this folding process
everywhere in $-1 < x < 1$. Since $\epsilon_{ij}^l$ can take positive or
negative values with equal probability, the probability to
have full folding decreases proportionally to $2^{-M}$.
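
The reduction to a one-dimensional map used above can be written out step by step. The following is a sketch that assumes zero thresholds and the layer map of Eq. (1) with gain $\tilde\beta = \beta/\sqrt{M}$; the index placement on the couplings ($\epsilon_{jk}^1$ for the first layer, $\epsilon_{1j}^2$ for the second) is an assumption about the notation.

```latex
\begin{align}
x_j^1 &= \tanh\Big(\tilde\beta \sum_{k=1}^{M} \epsilon_{jk}^{1} x_k^{0}\Big)
       = \tanh\big(\tilde\beta\, \epsilon_{j1}^{1}\, (x_1^{0} - v_j)\big),
\qquad
v_j = -\frac{1}{\epsilon_{j1}^{1}} \sum_{k=2}^{M} \epsilon_{jk}^{1} x_k^{0}, \\
x_1^2 &= \tanh\Big(\tilde\beta \sum_{j=1}^{M} \epsilon_{1j}^{2} x_j^{1}\Big)
       = \tanh\Big(\sum_{j=1}^{M} \sigma_j(x_1^{0} - v_j)\Big),
\qquad
\sigma_j(u) = \tilde\beta\, \epsilon_{1j}^{2} \tanh\big(\tilde\beta\, \epsilon_{j1}^{1} u\big).
\end{align}
```

For large $\tilde\beta$ each $\sigma_j$ is close to a step of height $\pm\tilde\beta\,|\epsilon_{1j}^{2}|$ located at $u = 0$, i.e. at $x_1^0 = v_j$, so the sign pattern of the steps along the ordered $v_j$ decides where the map rises and where it falls.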

The estimate given so far is for fixed inputs
$x_2^0, x_3^0, \ldots, x_M^0$. By changing these input values, the
ordering of $v_j$ changes accordingly (for the original index
without reordering), and hence there are in total
$(M-1)!$ possible orderings. Therefore, roughly speaking,
the input-output relationship has full up-down switches
for some input values $x_2^0, x_3^0, \ldots, x_M^0$ when $(M-1)!\,2^{-M}$
exceeds 1. In this case, at every layer, for any element,
the folding occurs fully for some inputs, and the folding
process covers most of phase space. Even though this
argument is quite rough, it is still possible to presume
that when $(M-1)!$ exceeds the order of $2^M$, the chaotic
dynamics replaces the separation by the threshold function.
Note that this factorial surpasses $2^M$ at M = 6,
coinciding with the magic number 7 ± 2. This could be
the reason why, at the magic number 7 ± 2, the separation
of states collapses and chaotic dynamics takes over.
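
The arithmetic behind this estimate is easy to check directly. The short Python sketch below tabulates $(M-1)!$ against $2^M$ and reports the first M at which the factorial is larger; the helper name `crossover` is hypothetical.

```python
import math

def crossover(max_m=20):
    """Return the smallest M with (M - 1)! > 2**M, or None if none is found."""
    for m in range(1, max_m + 1):
        if math.factorial(m - 1) > 2 ** m:
            return m
    return None

# Columns: M, (M-1)!, 2^M, and their ratio (M-1)! * 2^(-M).
for m in range(3, 10):
    fact = math.factorial(m - 1)
    power = 2 ** m
    print(m, fact, power, fact / power)

print("first M with (M-1)! > 2^M:", crossover())
```

For M = 5 the values are 24 and 32, and for M = 6 they are 120 and 64, so the ratio first exceeds 1 at M = 6 and grows rapidly afterwards.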



In the present paper, we have shown that the interference
between inputs drastically increases around $M \approx 7$
within the general setup of neural networks. The argument
for the magic number 7 ± 2 presented here is based only
on combinatorial arguments, and does not strongly depend
on the choice of parameters. Hence it is naturally
expected that our explanation works for a wide class of
entangled cascade networks with sigmoid units. Considering
also the generality of the mechanism, it is not particularly
far-fetched to infer a correspondence between
our result and the original magic number 7 ± 2 in psychology.
Of course, at present the underlying neurodynamics
associated with the actual psychological process is still
unknown, and hence we do not claim that we have found
the solution to the magic number 7 problem. Nevertheless,
the formation of distinct attracting sets resulting
from inputs channeled through layered networks with sigmoidal
elements is common among neural processes, and
hence it is important to mention the connection.

FIG. 1: Distribution of $P(x_1^l)$ at layer $l = 30$, obtained over
5.0 x 10^ inputs with a homogeneous distribution over $[-1, 1]$.
The parameter $\beta = 3.0$ and the threshold $\theta$ is set to 0. The
histogram is plotted by using a bin size 2.0/128. (a) M = 6,
(b) M = 8.

FIG. 2: Snapshots of $(x_1^l, x_2^l)$ at layer $l = 30$, plotted as a
projection of $\mathbf{x}^l$ onto the two-dimensional plane $(x_1, x_2)$. The
data obtained from 10^ randomly chosen inputs are overlaid
with $\beta = 3.0$. (a) M = 6. The density of points is quite high
around $(\pm 1, \pm 1)$. (b) M = 8. Multiple folding and stretching
processes are detected, while the points are scattered over the
plane.

FIG. 3: Bifurcation diagram of $x_1^{l=30}$ against $\beta$. $x_1^l$ at layer
$l = 30$ is plotted over 800 different inputs for each $\beta$. For
$\beta$ (< 1.0), $x_1^l$ converges to $x_1 = 0$. (a) M = 6. For most inputs,
$x_1^{l=30}$ converges to $\pm 1$ if $\beta$ is sufficiently larger than 1. (b) For
M = 8, $x_1^{l=30}$ is scattered over $[-1, 1]$.

FIG. 4: The average fraction of the values $x$ such that
$P(x_1^l = x) > 0$, plotted as a function of M. The histogram is
computed for 10' inputs over 200 networks (i.e., with different
choices of $\epsilon_{ij}^l$), with a bin size 2.0/128. The fraction at layer
$l = 10, 20, 30, 40, 50$ is plotted, (a) with threshold $\theta_i = 0$ and
(b) with distributed threshold $\theta_i \in [0, 0.5]$. The parameter $\beta$
is fixed at 3.0.

FIG. 5: The ratio of networks which show 'chaotic' behavior
for some values of $\beta$, plotted as a function of M. 500
networks are chosen for $\epsilon_{ij}^l$. The Lyapunov exponent is computed
by using $J^l$ over 500 steps (layers). The behavior is
judged numerically as "chaotic" if the exponent is larger than
0.0. The ratio is computed for the following three cases: the
threshold $\theta_i^l = 0$ with a Gaussian distribution of $\epsilon_{ij}^l$ (cross);
distributed thresholds $\theta_i^l \in [0, 0.5]$ with a Gaussian distribution
of $\epsilon_{ij}^l$ (box); and $\theta_i^l = 0$ with a uniform distribution for
$\epsilon_{ij}^l \in [-1, 1]$ (circle).



